Grounded Discovery of Coordinate Term Relationships between Software Entities

Authors

  • Dana Movshovitz-Attias
  • William W. Cohen
Abstract

We present an approach for the detection of coordinate-term relationships between entities from the software domain that refer to Java classes. Usually, relations are found by examining corpus statistics associated with text entities. In some technical domains, however, we have access to additional information about the real-world objects named by the entities, suggesting that coupling information about the “grounded” entities with corpus statistics might lead to improved methods for relation discovery. To this end, we develop a similarity measure for Java classes using distributional information about how they are used in software, which we combine with corpus statistics on the distribution of contexts in which the classes appear in text. Using our approach, cross-validation accuracy on this dataset can be improved dramatically, from around 60% to 88%. Human labeling results show that our classifier has an F1 score of 86% over the top 1000 predicted pairs.
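
The idea of coupling a code-based signal with a text-based signal can be illustrated with a toy sketch. The abstract does not specify the actual similarity measure or classifier, so everything here is an assumption made for illustration: the function names `cosine` and `combined_similarity`, the use of cosine similarity over context counts, the linear mixing `weight`, and the example counts are all hypothetical and are not the authors' method.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def combined_similarity(code_a: Counter, code_b: Counter,
                        text_a: Counter, text_b: Counter,
                        weight: float = 0.5) -> float:
    """Hypothetical mix of a code-usage signal and a text-context signal.

    `weight` is an illustrative parameter, not a value from the paper.
    """
    return weight * cosine(code_a, code_b) + (1 - weight) * cosine(text_a, text_b)


# Made-up counts: how often each class co-occurs with other code elements,
# and which words surround the class name in text.
code_list = Counter({"Iterator": 4, "Collections": 2, "String": 1})
code_set = Counter({"Iterator": 3, "Collections": 2, "Object": 1})
text_list = Counter({"add": 3, "elements": 2, "sorted": 1})
text_set = Counter({"add": 2, "elements": 2, "unique": 1})

score = combined_similarity(code_list, code_set, text_list, text_set)
print(f"combined similarity: {score:.3f}")
```

In a setup like this, a pair of classes scores highly only when both the code-usage contexts and the textual contexts overlap, which is the intuition behind grounding text entities in the objects they name.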

Journal:
  • CoRR

Volume abs/1505.00277

Publication date 2015